Effects of Sample Size on Differential Gene Expression, Rank Order and Prediction Accuracy of a Gene Signature

Authors

  • Cynthia Stretch
  • Sheehan Khan
  • Nasimeh Asgarian
  • Roman Eisner
  • Saman Vaisipour
  • Sambasivarao Damaraju
  • Kathryn Graham
  • Oliver F. Bathe
  • Helen Steed
  • Russell Greiner
  • Vickie E. Baracos
Abstract

Lists of top differentially expressed genes are often inconsistent between studies, and it has been suggested that small sample sizes contribute to the lack of reproducibility and to poor prediction accuracy in discriminative models. We considered sex differences (69 ♂, 65 ♀) in 134 human skeletal muscle biopsies using DNA microarrays. The full dataset and subsamples thereof, from n = 10 (5 ♂, 5 ♀) to n = 120 (60 ♂, 60 ♀), were used to assess the effect of sample size on the differential expression of single genes, gene rank order and prediction accuracy. Using our full dataset (n = 134), we identified 717 differentially expressed transcripts (p < 0.0001) and were able to predict sex with ~90% accuracy, both within our dataset and on external datasets. Both the p-values and the rank order of the top differentially expressed genes became more variable with smaller subsamples. For example, at n = 10 (5 ♂, 5 ♀), no gene was considered differentially expressed at p < 0.0001 and prediction accuracy was ~50% (no better than chance). We found that sample size clearly affects microarray analysis results: small sample sizes result in unstable gene lists and poor prediction accuracy. We anticipate that this will apply to other phenotypes in addition to sex.
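
The subsampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses synthetic Gaussian data in place of the microarray measurements, ranks genes by the absolute Welch t statistic as a stand-in for their differential expression test, and measures how much of the full-data top-k gene list is recovered from balanced subsamples. All function names and parameters (`simulate`, `top_genes`, `mean_overlap`, `n_true`, `effect`) are hypothetical.

```python
import random
import statistics


def welch_t(a, b):
    """Welch t statistic for two lists of expression values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def top_genes(males, females, k):
    """Indices of the k genes with the largest |t| between the two groups."""
    scores = []
    for g in range(len(males[0])):
        t = welch_t([s[g] for s in males], [s[g] for s in females])
        scores.append((abs(t), g))
    scores.sort(reverse=True)
    return {g for _, g in scores[:k]}


def simulate(n_male=69, n_female=65, n_genes=200, n_true=20, effect=1.0, seed=0):
    """Synthetic data (an assumption): the first n_true genes are shifted in males."""
    rng = random.Random(seed)

    def sample(shift):
        return [rng.gauss(shift if g < n_true else 0.0, 1.0) for g in range(n_genes)]

    males = [sample(effect) for _ in range(n_male)]
    females = [sample(0.0) for _ in range(n_female)]
    return males, females


def mean_overlap(males, females, per_group, k, repeats, seed=1):
    """Mean fraction of the full-data top-k list recovered from balanced subsamples."""
    rng = random.Random(seed)
    reference = top_genes(males, females, k)
    overlaps = []
    for _ in range(repeats):
        sub_m = rng.sample(males, per_group)    # draw without replacement
        sub_f = rng.sample(females, per_group)
        overlaps.append(len(top_genes(sub_m, sub_f, k) & reference) / k)
    return statistics.mean(overlaps)


if __name__ == "__main__":
    males, females = simulate()
    for per_group in (5, 20, 60):
        overlap = mean_overlap(males, females, per_group, k=20, repeats=10)
        print(f"n = {2 * per_group}: mean top-20 overlap = {overlap:.2f}")
```

In this toy setting the overlap with the full-data list should shrink as the subsample gets smaller, which mirrors the instability of gene lists described in the abstract. The exact values depend entirely on the assumed effect size and noise level.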

Similar resources

Forecasting copper price using gene expression programming

Forecasting the prices of metals is important in many aspects of economics. Metal prices are also vital variables in financial models for revenue evaluation, which forms the basis of an effective payment regime used by resource policymakers. Because of the severe changes in metal prices in recent years, the classic estimation methods cannot correctly estimate the volatility. In order to...

Multivariate Feature Extraction for Prediction of Future Gene Expression Profile

Introduction: The features of a cell can be extracted from its gene expression profile. If the gene expression profiles of future descendant cells are predicted, the features of the future cells are also predicted. The objective of this study was to design an artificial neural network to predict gene expression profiles of descendant cells that will be generated by division/differentiation of h...

Effects of Over-Expression of LOC92912 Gene on Cell Cycle Progression

Background: We had previously identified the genes involved in squamous cell carcinoma of the head and neck using differential display and DNA microarray techniques. We also reported the first analytical study on a novel human gene called LOC92912, which was identified by differential display as a gene up-regulated in such carcinomas. LOC92912, which is a putative member of the E2 ubiquitin con...

Prediction of Blasting Cost in Limestone Mines Using Gene Expression Programming Model and Artificial Neural Networks

Blasting cost (BC) prediction is needed to achieve optimal fragmentation and to control the adverse consequences of blasting, such as fly rock, ground vibration, and air blast, in open-pit mines. In this research work, BC is predicted from 146 blasting data records collected from six limestone mines in Iran using the artificial neural networks (ANNs), gene expression programming (G...

Volume 8

Publication date: 2013